Boosting Generic Visual-Linguistic Representation with Dynamic Contexts
Authors
Abstract
Pretraining large models on generous multi-modal corpora has accelerated the development of visual-linguistic (VL) representation and achieved great success on various vision-and-language downstream tasks. Learning these representations is usually executed by predicting randomly masked words in captions or patches in images. Such approaches, nevertheless, seldom explore supervision from the causalities behind caption descriptions, i.e., the procedure generating events beyond a still image. In this work, we endow pretrained models with high-level cognition by delving into dynamic contexts to model visual and linguistic events uniformly. Specifically, we format the dynamic contexts of an image as sentences describing events before, on, and after the image. Unlike traditional caption-wise similarity, we propose a novel dynamic contexts-based similarity (DCS) metric, in which the correlation of potential causes and effects, besides the immediate image content, is considered to measure relevance among images. DCS can be further simplified by parameterizing event continuity to relax the requirements of dense contextual annotations. A new pre-task is designed to minimize feature distances between dynamically relevant images and incorporate causality commonsense knowledge into VL learning. Models based on our approach significantly outperform typical baselines on multiple cross-modal tasks, including conventional visual commonsense reasoning (VCR), visual question answering (VQA), zero-shot image-text retrieval, and extended event/caption ordering.
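The DCS idea sketched in the abstract can be illustrated with a minimal, hypothetical example: each image carries three context-sentence embeddings (before, on, after the depicted moment), and relevance between two images averages the per-slot cosine similarities. The embedding vectors, slot names, and uniform weighting here are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a dynamic-contexts-based similarity (DCS).
# Each image is represented by embeddings of its "before"/"on"/"after"
# context sentences; DCS is a weighted mean of slot-wise cosine similarities.
import math

def cosine(u, v):
    # Standard cosine similarity; returns 0.0 for zero-norm vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def dcs(contexts_a, contexts_b, weights=(1.0, 1.0, 1.0)):
    """contexts_*: dicts mapping 'before'/'on'/'after' to embedding vectors."""
    slots = ("before", "on", "after")
    total = sum(w * cosine(contexts_a[s], contexts_b[s])
                for s, w in zip(slots, weights))
    return total / sum(weights)

# Two images with identical dynamic contexts are maximally relevant.
a = {"before": [1.0, 0.0], "on": [0.0, 1.0], "after": [1.0, 1.0]}
b = {"before": [1.0, 0.0], "on": [0.0, 1.0], "after": [1.0, 1.0]}
print(round(dcs(a, b), 6))  # → 1.0
```

In a pretraining objective, such a score could rank image pairs so that feature distances between dynamically relevant images are minimized, as the abstract describes.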
Similar resources
Generic Object Recognition with Strangeness and Boosting
People can quickly recognize an enormous number of rigid/non-rigid objects, such as cars, faces, and trees, regardless of viewpoint, lighting, illumination, and local deformation. How to recognize generic objects has long been a hard problem in psychophysics, neurobiology, and computation. Based on research in psychophysics and neurobiology, humans interpret the image scene (label class) ...
Fighting biases with dynamic boosting
While gradient boosting algorithms are the workhorse of modern industrial machine learning and data science, all current implementations are susceptible to a nontrivial but damaging form of label leakage. It results in a systematic bias in pointwise gradient estimates that lead to reduced accuracy. This paper formally analyzes the issue and presents solutions that produce unbiased pointwise gra...
Visual Representation of 3D Language Constructs Specified by Generic Depictions
Several modeling domains make use of three-dimensional representations, e.g., the “ball-and-stick” models of molecules. Our generator framework DEViL3D supports the design and implementation of visual 3D languages for such modeling purposes. The front-end of a language implementation generated by DEViL3D is a dedicated 3D graphical structure editor, which is used to construct programs in that d...
Mining linguistic tone patterns with symbolic representation
This paper conceptualizes speech prosody data mining and its potential application in data-driven phonology/phonetics research. We first conceptualize Speech Prosody Mining (SPM) in a time-series data mining framework. Specifically, we propose using efficient symbolic representations for speech prosody time-series similarity computation. We experiment with both symbolic and numeric representati...
Leveraging k-NN for generic classification boosting
Voting rules relying on k-nearest neighbors (k-NN) are an effective tool in countless many machine learning techniques. Thanks to its simplicity, k-NN classification is very attractive to practitioners, as it enables very good performances in several practical applications. However, it suffers from various drawbacks, like sensitivity to “noisy” instances and poor generalization properties when ...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2023
ISSN: 1520-9210, 1941-0077
DOI: https://doi.org/10.1109/tmm.2023.3237164